Skip to content

Conversation

jwu10003
Copy link

[DAGCombiner] Preserve debug location of original load in fold (conv (load x))

This patch fixes a debug information loss issue during the combine of a conversion (e.g., bitcast) with a load into a new load: fold (conv (load x)) -> (load (conv*)x).

The newly created load node was incorrectly using the debug location (SDLoc) of the conversion operation (the conv node, N) instead of the location of the original load operation (the load node, LN0). The location of the conversion operation often points to compiler-internal instructions and provides little value for source-level debugging. In contrast, the original load's location accurately represents the source of the data access in the user's code.

This change ensures the new load inherits the debug location from LN0 by using SDLoc(LN0), which improves debugging experience and fixes a test case failure observed in the Triton compiler.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added backend:AMDGPU llvm:SelectionDAG SelectionDAGISel as well labels Sep 23, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 23, 2025

@llvm/pr-subscribers-debuginfo
@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-amdgpu

Author: None (jwu10003)

Changes

[DAGCombiner] Preserve debug location of original load in fold (conv (load x))

This patch fixes a debug information loss issue during the combine of a conversion (e.g., bitcast) with a load into a new load: fold (conv (load x)) -> (load (conv*)x).

The newly created load node was incorrectly using the debug location (SDLoc) of the conversion operation (the conv node, N) instead of the location of the original load operation (the load node, LN0). The location of the conversion operation often points to compiler-internal instructions and provides little value for source-level debugging. In contrast, the original load's location accurately represents the source of the data access in the user's code.

This change ensures the new load inherits the debug location from LN0 by using SDLoc(LN0), which improves debugging experience and fixes a test case failure observed in the Triton compiler.


Full diff: https://github.com/llvm/llvm-project/pull/160236.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+1-1)
  • (added) llvm/test/CodeGen/AMDGPU/combine-conv-load.ll (+41)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index a6ba6e518899f..4cb0a35aa7b25 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -16703,7 +16703,7 @@ SDValue DAGCombiner::visitBITCAST(SDNode *N) {
         }
       }
       SDValue Load =
-          DAG.getLoad(VT, SDLoc(N), LN0->getChain(), LN0->getBasePtr(),
+          DAG.getLoad(VT, SDLoc(LN0), LN0->getChain(), LN0->getBasePtr(),
                       LN0->getMemOperand());
       DAG.ReplaceAllUsesOfValueWith(N0.getValue(1), Load.getValue(1));
       return Load;
diff --git a/llvm/test/CodeGen/AMDGPU/combine-conv-load.ll b/llvm/test/CodeGen/AMDGPU/combine-conv-load.ll
new file mode 100644
index 0000000000000..900c973b712ae
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/combine-conv-load.ll
@@ -0,0 +1,41 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx942  < %s | FileCheck %s
+
+; CHECK-LABEL:  test:
+; CHECK:        .loc    1 8 16                          ; test.py:8:16
+; CHECK-NEXT:   s_load_dword
+
+; Function Attrs: alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
+define amdgpu_kernel void @test(ptr addrspace(1) inreg readonly captures(none) %0, ptr addrspace(1) inreg writeonly captures(none) %1, ptr addrspace(1) inreg readnone captures(none) %2, ptr addrspace(1) inreg readnone captures(none) %3) local_unnamed_addr #0 !dbg !4 {
+  %5 = tail call i32 @llvm.amdgcn.workitem.id.x(), !dbg !7
+  %6 = and i32 %5, 255, !dbg !7
+  %7 = icmp eq i32 %6, 0, !dbg !7
+  br i1 %7, label %8, label %10, !dbg !7
+
+8:                                                ; preds = %4
+  %9 = load <1 x float>, ptr addrspace(1) %0, align 4, !dbg !8, !amdgpu.noclobber !6
+  store <1 x float> %9, ptr addrspace(1) %1, align 4, !dbg !7
+  br label %10, !dbg !7
+
+10:                                               ; preds = %8, %4
+  ret void, !dbg !9
+}
+
+; Function Attrs: alwaysinline nocallback nofree nosync nounwind speculatable willreturn memory(none)
+declare noundef range(i32 0, 1024) i32 @llvm.amdgcn.workitem.id.x() #1
+
+attributes #0 = { alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) "amdgpu-agpr-alloc"="0" "amdgpu-flat-work-group-size"="1,256" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="1,1" "denormal-fp-math-f32"="ieee" "uniform-work-group-size"="false" }
+attributes #1 = { alwaysinline nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "triton", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
+!1 = !DIFile(filename: "test.py", directory: "/path")
+!2 = !{i32 2, !"Debug Info Version", i32 3}
+!3 = !{i32 1, !"amdhsa_code_object_version", i32 500}
+!4 = distinct !DISubprogram(name: "test", linkageName: "test", scope: !1, file: !1, line: 7, type: !5, scopeLine: 7, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
+!5 = !DISubroutineType(cc: DW_CC_normal, types: !6)
+!6 = !{}
+!7 = !DILocation(line: 9, column: 20, scope: !4)
+!8 = !DILocation(line: 8, column: 16, scope: !4)
+!9 = !DILocation(line: 9, column: 4, scope: !4)

@@ -16703,7 +16703,7 @@ SDValue DAGCombiner::visitBITCAST(SDNode *N) {
}
}
SDValue Load =
DAG.getLoad(VT, SDLoc(N), LN0->getChain(), LN0->getBasePtr(),
DAG.getLoad(VT, SDLoc(LN0), LN0->getChain(), LN0->getBasePtr(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would getMergedLocation work better?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would getMergedLocation work better?

I did some experiments:
LN0 loc = test.py:8:16
N loc = test.py:9:20
=>
getMergedLocation(LN0, N) = test.py:0

Is this as expected?

The code is as follows. If there are any mistakes, please point them out to me.
image

However, the test needs to check test.py:8:16.


; Function Attrs: alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
define amdgpu_kernel void @test(ptr addrspace(1) inreg readonly captures(none) %0, ptr addrspace(1) inreg writeonly captures(none) %1, ptr addrspace(1) inreg readnone captures(none) %2, ptr addrspace(1) inreg readnone captures(none) %3) local_unnamed_addr #0 !dbg !4 {
%5 = tail call i32 @llvm.amdgcn.workitem.id.x(), !dbg !7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use named values in tests

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete redundant IR, retaining only the necessary load and store operations.

store <1 x float> %9, ptr addrspace(1) %1, align 4, !dbg !7
br label %10, !dbg !7

10: ; preds = %8, %4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need multiple blocks?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need multiple blocks?

done

@@ -0,0 +1,41 @@
; RUN: llc -mtriple=amdgcn -mcpu=gfx942 < %s | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
; RUN: llc -mtriple=amdgcn -mcpu=gfx942 < %s | FileCheck %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx942 < %s | FileCheck %s

Test belongs in test/DebugInfo

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test belongs in test/DebugInfo

done

; CHECK: .loc 1 8 16 ; test.py:8:16
; CHECK-NEXT: s_load_dword

; Function Attrs: alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
; Function Attrs: alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 26 to 27
attributes #0 = { alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) "amdgpu-agpr-alloc"="0" "amdgpu-flat-work-group-size"="1,256" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="1,1" "denormal-fp-math-f32"="ieee" "uniform-work-group-size"="false" }
attributes #1 = { alwaysinline nocallback nofree nosync nounwind speculatable willreturn memory(none) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can drop all of this

Suggested change
attributes #0 = { alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) "amdgpu-agpr-alloc"="0" "amdgpu-flat-work-group-size"="1,256" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="1,1" "denormal-fp-math-f32"="ieee" "uniform-work-group-size"="false" }
attributes #1 = { alwaysinline nocallback nofree nosync nounwind speculatable willreturn memory(none) }
attributes #0 = { nounwind }
attributes #1 = { alwaysinline nocallback nofree nosync nounwind speculatable willreturn memory(none) }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can drop all of this

done

define void @test(ptr addrspace(1) inreg readonly captures(none) %0, ptr addrspace(1) inreg writeonly captures(none) %1) local_unnamed_addr !dbg !4 {
%3 = load <1 x float>, ptr addrspace(1) %0, align 4, !dbg !8, !amdgpu.noclobber !6
store <1 x float> %3, ptr addrspace(1) %1, align 4, !dbg !7

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 8 to 9
%3 = load <1 x float>, ptr addrspace(1) %0, align 4, !dbg !8, !amdgpu.noclobber !6
store <1 x float> %3, ptr addrspace(1) %1, align 4, !dbg !7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
%3 = load <1 x float>, ptr addrspace(1) %0, align 4, !dbg !8, !amdgpu.noclobber !6
store <1 x float> %3, ptr addrspace(1) %1, align 4, !dbg !7
%ld = load <1 x float>, ptr addrspace(1) %arg0, align 4, !dbg !8, !amdgpu.noclobber !6
store <1 x float> %ld, ptr addrspace(1) %arg1, align 4, !dbg !7

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

; CHECK: .loc 1 8 16 prologue_end ; test.py:8:16
; CHECK-NEXT: s_load_dword

define void @test(ptr addrspace(1) inreg readonly captures(none) %0, ptr addrspace(1) inreg writeonly captures(none) %1) local_unnamed_addr !dbg !4 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
define void @test(ptr addrspace(1) inreg readonly captures(none) %0, ptr addrspace(1) inreg writeonly captures(none) %1) local_unnamed_addr !dbg !4 {
define void @test(ptr addrspace(1) inreg readonly captures(none) %arg0, ptr addrspace(1) inreg writeonly captures(none) %arg1) local_unnamed_addr !dbg !4 {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

…(load x))

This patch fixes a debug information loss issue during the combine of a conversion (e.g., bitcast)
with a load into a new load: `fold (conv (load x)) -> (load (conv*)x)`.

The newly created load node was incorrectly using the debug location (`SDLoc`) of the conversion
operation (the `conv` node, `N`) instead of the location of the original load operation (the `load`
node, `LN0`). The location of the conversion operation often points to compiler-internal
instructions and provides little value for source-level debugging. In contrast, the original load's
location accurately represents the source of the data access in the user's code.

This change ensures the new load inherits the debug location from `LN0` by using `SDLoc(LN0)`,
which improves debugging experience and fixes a test case failure observed in the Triton compiler.
@jwu10003 jwu10003 force-pushed the jian.wu.fix_debugging_info branch from 13de15a to e647079 Compare September 23, 2025 11:42
@jwu10003
Copy link
Author

@arsenm do you know how to execute the "Build and Test Linux" and "Build and Test Windows" ?

@arsenm arsenm enabled auto-merge (squash) September 23, 2025 12:00
Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 4cb0a35aa..d80f20317 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -16702,9 +16702,8 @@ SDValue DAGCombiner::visitBITCAST(SDNode *N) {
           LN0->getMemOperand()->clearRanges();
         }
       }
-      SDValue Load =
-          DAG.getLoad(VT, SDLoc(LN0), LN0->getChain(), LN0->getBasePtr(),
-                      LN0->getMemOperand());
+      SDValue Load = DAG.getLoad(VT, SDLoc(LN0), LN0->getChain(),
+                                 LN0->getBasePtr(), LN0->getMemOperand());
       DAG.ReplaceAllUsesOfValueWith(N0.getValue(1), Load.getValue(1));
       return Load;
     }

@arsenm arsenm disabled auto-merge September 23, 2025 12:06
@jwu10003 jwu10003 changed the title [DAGCombiner] Preserve debug location of original load in fold (conv (load x)) [Draft][DAGCombiner] Preserve debug location of original load in fold (conv (load x)) Sep 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants